Multi-institution model (big model) versus single-institution model of knowledge-based volumetric modulated arc therapy (VMAT) planning for prostate cancer

We established a multi-institution model (big model) of knowledge-based treatment planning with over 500 treatment plans from five institutions in volumetric modulated arc therapy (VMAT) for prostate cancer. This study aimed to clarify the efficacy of using a large number of registered treatment plans for sharing the big model. The big model was created with 561 clinically approved VMAT plans for prostate cancer from five institutions (A: 150, B: 153, C: 49, D: 60, and E: 149) with different planning strategies. The dosimetric parameters of the planning target volume (PTV), rectum, and bladder for two validation VMAT plans generated with the big model were compared with those from each institutional model (single-institution model). The goodness-of-fit of the regression lines (R² and χ² values) and the ratios of outliers with Cook's distance (CD) > 4.0, modified Z-score (mZ) > 3.5, studentized residual (SR) > 3.0, and areal difference of estimate (dA) > 3.0 for regression scatter plots in the big model and single-institution models were also evaluated. The mean ± standard deviation (SD) values of the dosimetric parameters were as follows (big model vs. single-institution model): 79.0 ± 1.6 vs. 78.7 ± 0.5 Gy (D50) and 0.13 ± 0.06 vs. 0.13 ± 0.07 (homogeneity index) for the PTV; 6.6 ± 4.0 vs. 8.4 ± 3.6% (V90) and 32.4 ± 3.8 vs. 46.6 ± 15.4% (V50) for the rectum; and 13.8 ± 1.8 vs. 13.3 ± 4.3% (V90) and 39.9 ± 2.0 vs. 38.4 ± 5.2% (V50) for the bladder. The R² values in the big model were 0.251 and 0.755 for the rectum and bladder, respectively, which were comparable to those from each single-institution model. The respective χ² values in the big model were 1.009 and 1.002, which were closer to 1.0 than those from each single-institution model. The ratios of outliers in the big model were also comparable to those from each single-institution model. The big model generated VMAT plans of comparable quality to those of each single-institution model and could therefore possibly be shared with other institutions.

Abbreviations
D2: Dose received by at least 2% of the volume
D50: Dose received by at least 50% of the volume
D95: Dose received by at least 95% of the volume
D98: Dose received by at least 98% of the volume
V90: Volume receiving 90% of the prescribed dose
V80: Volume receiving 80% of the prescribed dose
V50: Volume receiving 50% of the prescribed dose
HI: Homogeneity index
GEDVH: Geometrical dose volume histogram
PCS: Principal component score
R²: Coefficient of determination
χ²: Average chi squared
CD: Cook's distance
mZ: Modified Z-score
SR: Studentized residual
dA: Areal difference of estimate
SD: Standard deviation

Intensity-modulated radiotherapy (IMRT) and volumetric modulated arc therapy (VMAT) treatment planning require trial and error during the optimization process to obtain an ideal dose distribution. The plan quality for IMRT and VMAT depends on the knowledge and experience of the planner or institution during optimization, which can cause large intra- and inter-institutional variability [1-5] and can sometimes even affect treatment outcomes [6]. RapidPlan (RP) (Varian Medical Systems, Palo Alto, CA, USA), a knowledge-based planning software, uses a model library containing the dose-volume histograms (DVHs) of previous treatment plans. It automatically provides optimization objectives for future patients based on a trained model for VMAT planning. Previous studies concluded that RP with a single optimization could create clinically acceptable VMAT plans for prostate cancer, and could also reduce the optimization time independently of the planner's skill level and knowledge [7]. Furthermore, it was expected that RP would be shared among institutions and thereby standardize the plan quality between them [8-11].
However, sharing a single-institution model with multiple institutions has remained a challenge, because an RP model depends on its registered plans and therefore on the planning strategies of each institution, such as the method of prescribing dose to the targets and the dose constraints for the organs at risk (OARs) [12].
Panettieri et al. attempted to share a model trained with 110 treatment plans from multiple institutions that had different irradiation methods (IMRT and VMAT), contouring, planning strategies, and prescription doses, which contributed to reducing the intra- and inter-institutional variability [13]. However, all the plans in the multi-institution model were standardized by achieving the DVH constraints of their group. Therefore, sharing their multi-institution model with institutions that had different planning strategies and experience remained limited.
We hypothesized that a model with a large number of plans could be applied to various planning strategies. To examine this, we established and evaluated a multi-institution model (big model) that aggregated over 500 treatment plans from five institutions with different planning strategies and constraints for targets and OARs in prostate cancer VMAT. In this study, we compared the big model with each institutional model (single-institution model) by using the dosimetric parameters of the planning target volume (PTV), rectum, and bladder for two validation VMAT plans. We aimed to clarify the efficacy of a big model built from a large number of registered treatment plans, and its potential to reduce inter-institutional variability in plan quality, with a view to sharing it.

Institutions and plan design. Five institutions (A-E) that treated prostate cancer cases with VMAT in Japan were enrolled. The definition of gross tumor volume, the margins defining the clinical target volume (CTV) and PTV in each direction, and the dose constraints have been described in previous studies [12,14]. Table 1 shows the dose constraints used by each institution. The five institutions had different planning strategies. All methods were performed in accordance with the relevant guidelines.
Development of the single-institution model and the big model. An RP model is a mathematical model that uses knowledge from the included treatment plans to generate the estimated DVH and estimate-based objectives in the optimization process. The RP algorithm was explained in detail by Fogliata et al. [15]. The single-institution models and the big model for RP were created using the prostate VMAT plans for clinical use at each institution. The numbers of cases registered in the single-institution models of institutions A, B, C, D, and E were 123, 53, 20, 60, and 100, respectively. To build the big model, 561 approved clinical plans, including 150 from A, 153 from B, 49 from C, 60 from D, and 149 from E, were anonymized and submitted by each institution. These clinical plans were used at each institution from April 2017 to April 2019. The clinical plans used to configure the single-institution models were also registered in the big model, and the outliers were not excluded.

Validation of each model. Two sets of computed tomography (CT) data and structures (cases I and II) used at institution B were anonymized and delivered to the other institutions. The CT slice thickness was 2.5 mm and the field of view was 50 cm. The target and OARs were contoured by a radiation oncologist according to the protocol of institution B. The bladder volume was 83.8 cm³ in case I and 181.8 cm³ in case II. Planners at each institution who had sufficient experience with RapidPlan calculated the dose distributions with the single-institution model and the big model using Eclipse ver. 13.0 or 15.6 (Varian). The objective settings for the big model, as shown in Table 2, were the same as the settings of the single-institution model of each institution.
To evaluate the dose distributions calculated with the single-institution model and big model, the minimum dose (in Gy) to 2%, 50%, 95%, and 98% (D2, D50, D95, and D98) of the PTV and the volume ratio receiving 90%, 80%, and 50% of the prescribed dose (V90, V80, and V50) for the rectum and bladder were calculated in the two cases. The homogeneity index (HI), defined as HI = (D2 - D98)/D50, was also calculated. In this study, a dose prescription of 78 Gy (in 39 fractions) was used for the calculation. The differences of dosimetric parameters between the single-institution model (Ds) and big model (Db) were calculated as follows:

Model analysis. In RP, principal component analysis is performed on the geometrical dose-volume histogram (GEDVH) and the actual DVH. A regression model relating the principal component scores (PCSs) of the GEDVH and DVH is used to estimate the ideal DVH for a new case, so the quality of this regression indicates the performance of the estimation. The goodness-of-fit of the regression models was evaluated using the coefficient of determination (R²) and the average chi squared (χ²) value. The R² value ranges from 0 to 1, with a larger value indicating a better model fit. A χ² value closer to 1.0 provides more certainty that the quality of the regression model is good. In addition, to evaluate the outliers of the rectum and bladder in each model, the following four parameters were evaluated: Cook's distance (CD), modified Z-score (mZ), studentized residual (SR), and areal difference of estimate (dA). CD indicates the influential data points in a regression model; a data point with a high CD value has a substantial effect on the regression line. The mZ value measures the difference of an individual geometric parameter from the median value in a training set and identifies geometric outliers.
The SR value measures the difference of PCSs of the DVHs between the original data and the estimated data (e.g., first PCS of the original DVH versus first PCS of the estimated DVH), which reveals dosimetric outliers. The dA value indicates the difference between the estimated dose distribution and the actual one, and is essentially the difference between the estimated DVH curve and the actual DVH curve.

To investigate whether the training sets of each single-institution model and the big model covered the geometrical characteristics of cases I and II, such as those of the targets, rectum, and bladder, we checked whether the following parameters were within two standard deviations of the median of the training set: target and OAR volumes, OAR out-of-field volume percentage, OAR overlap volume percentage to target, and geometric distribution PCS. A more detailed description of the RP and DVH estimation algorithm can be found in reference [16].
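The dosimetric parameters defined earlier in this section (D2, D50, D98, V90, V80, V50, and HI) can be illustrated with a minimal Python sketch. It assumes that the dose to a structure is available as an array of per-voxel doses in Gy with equal voxel volumes; the function names and the sample values are hypothetical and are not part of RapidPlan or Eclipse.

```python
import numpy as np

PRESCRIPTION_GY = 78.0  # prescription used in this study (39 fractions)


def dose_at_volume(dose_gy, volume_percent):
    """D_x: minimum dose (Gy) received by the hottest x% of the structure.

    Assumes equal voxel volumes, so D_x is the (100 - x)th percentile
    of the per-voxel doses.
    """
    return float(np.percentile(np.asarray(dose_gy, dtype=float),
                               100.0 - volume_percent))


def volume_at_dose(dose_gy, dose_percent, prescription_gy=PRESCRIPTION_GY):
    """V_x: percentage of the structure receiving at least x% of the prescription."""
    threshold_gy = prescription_gy * dose_percent / 100.0
    doses = np.asarray(dose_gy, dtype=float)
    return float(100.0 * np.mean(doses >= threshold_gy))


def homogeneity_index(dose_gy):
    """HI = (D2 - D98) / D50, as defined in the text."""
    d2 = dose_at_volume(dose_gy, 2.0)
    d98 = dose_at_volume(dose_gy, 98.0)
    d50 = dose_at_volume(dose_gy, 50.0)
    return (d2 - d98) / d50


if __name__ == "__main__":
    # Hypothetical per-voxel doses (Gy) for an OAR, for illustration only.
    oar_dose = [10.0, 30.0, 50.0, 75.0, 80.0]
    print("V50 (%):", volume_at_dose(oar_dose, 50.0))
    print("V90 (%):", volume_at_dose(oar_dose, 90.0))
```

In practice these values would be read from the cumulative DVH exported by the treatment planning system rather than recomputed from voxel doses; the sketch only makes the definitions concrete.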

Results
Dosimetric parameters for the PTV, rectum, and bladder. Table 3 shows the mean and standard deviation (SD) values of the dosimetric parameters for the PTV, rectum, and bladder that were calculated with each single-institution model and the big model. There were no significant differences in the dosimetric parameters (P > 0.05) between the single-institution models and the big model. In the rectum, all averages of the dosimetric parameters with the big model were lower than those with the single-institution models. An average difference of more than 10% was observed in V50 for the rectum in each case. For the PTV, the SD values were similar between the single-institution models and the big model. However, for both the rectum and bladder V50, the big model had lower SD values than the single-institution models in each case.

Figure 1 shows the dosimetric parameter differences for the PTV, rectum, and bladder between the single-institution models and the big model in each case. For the PTV, there were small differences between the single-institution models and the big model. The maximum difference in D95 for the PTV among institutions was 3.9 Gy in institution D. Dosimetric parameters for the rectum calculated with the big model were lower than those calculated with the single-institution models. The maximum difference in V50 between the big model and the single-institution model was 37.2% in institution D. The maximum differences among institutions for the single-institution model and big model were 9.3% and 10.2% for V90, 4.4% and 8.6% for V80, and 37.3% and 10.5% for V50, respectively. For V50, the big model reduced the differences among institutions compared with the single-institution models. However, for both V90 and V80, the big model did not reduce the differences among institutions compared with the single-institution models.
In the bladder, the dosimetric parameters calculated with the big model were lower than or equivalent to those calculated with the single-institution model, except for institution D. The maximum differences among institutions for the single-institution model and big

Model analytics. Table 4 shows the R² and χ² values of the regression models in each model. The R² value calculated from the regression lines between the PCSs of the DVH and GEDVH for the big model was comparable to those from each single-institution model. The χ² value for the big model was the closest to 1.0 among all models. Table 5 shows the ratio and number of outliers for each index (CD > 4.0 [17], mZ > 3.5, SR > 3.0, and dA > 3.0 [18]) for the rectum and bladder in the training data of each model. The ratio and number of outliers in the big model were comparable to those from each single-institution model. The big model and the institution A model covered all geometrical characteristics of cases I and II, whereas the other single-institution models did not cover the following geometric parameters of cases I and II: institution B model: out-of-field volume percentage of the bladder; institution C model: bladder volume, overlap volume between target and OARs, and geometric distribution PCS of OARs; institution D model: out-of-field volume percentage of the bladder, overlap volume between target and the rectum, target volume, and geometric distribution PCS of the bladder; institution E model: target volume.
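The geometric coverage check reported above, in which a case counts as covered when each parameter lies within two standard deviations of the training-set median, can be sketched as follows. This is a minimal illustration and not the RapidPlan implementation: the function names, the use of the sample standard deviation, and the training values are assumptions made for the example; only the bladder volume of case I (83.8 cm³) is taken from the text.

```python
import numpy as np


def within_two_sd_of_median(training_values, case_value, n_sd=2.0):
    """Return True if case_value lies within n_sd SDs of the training-set median."""
    values = np.asarray(training_values, dtype=float)
    median = np.median(values)
    sd = np.std(values, ddof=1)  # sample SD; an assumption of this sketch
    return bool(abs(case_value - median) <= n_sd * sd)


def uncovered_parameters(training_set, case):
    """List the geometric parameters of a case that the training set does not cover."""
    return [name for name, values in training_set.items()
            if not within_two_sd_of_median(values, case[name])]


if __name__ == "__main__":
    # Hypothetical training-set values, for illustration only.
    training_set = {
        "bladder_volume_cm3": [100.0, 120.0, 140.0, 160.0, 180.0],
        "target_volume_cm3": [60.0, 62.0, 64.0, 66.0, 68.0],
    }
    # Bladder volume of case I from the text; the target volume is hypothetical.
    case = {"bladder_volume_cm3": 83.8, "target_volume_cm3": 75.0}
    print(uncovered_parameters(training_set, case))
```

With these illustrative numbers the bladder volume is covered while the target volume is not, which mirrors the kind of per-parameter result listed for institutions B to E.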

Discussion
In this study, the multi-institution model (big model) was developed with 561 VMAT plans from five institutions with different planning strategies for prostate cancer. We evaluated the dose parameters of the VMAT plans generated with this big model. The big model generated better or comparable dosimetric parameters compared with each single-institution model. The dosimetric parameters of the OARs were improved, especially V50, which may help prevent radiation toxicity in the rectum and bladder [19]. Additionally, the big model maintained coverage of the PTV and reduced inter-institution variation in the OAR doses.
The dose coverage of the PTV for the VMAT plans generated with the big model was comparable to that with the single-institution models, as shown in Table 3. It reflected the planning strategies of each institution, even though each institution used different prescribing methods. The original PTV objectives of each institution in Table 2 allowed the VMAT plans generated with the big model to reflect that institution's planning strategy. Thus, the big model could be used at several institutions by setting the PTV objectives according to each institution's planning strategy. Moreover, the VMAT plans generated with the big model reduced the doses to the rectum at all institutions, as well as to the bladder at all institutions except for institution D, compared with the single-institution models (Fig. 1), although the differences were not significant. This is because the big model has a wide range of geometrical information from the 561 plans and thus could cover the geometrical characteristics of the patients. The geometric characteristics of cases I and II were out of the range of all single-institution models except for that of institution A. This indicates that the estimation accuracy of those models could potentially deteriorate, while the big model covered the anatomical characteristics of those cases. Tol et al. noted that a wide range of anatomical information in the RP model was important for generating better plan quality compared with the clinical plans [20]. The line objectives along the DVH lower bounds were also useful for optimizing the estimated DVHs predicted from the big model with its large number of combinations of anatomical and dosimetric characteristics of registered plans [15,21,22]. In the rectum, the big model did not reduce the differences in the V90 and V80 values among institutions compared with the single-institution models, as shown in Fig. 1.
This is because the rectum V90 and V80 correspond to areas that overlap with the PTV and were therefore affected by the different PTV planning strategies of each institution.

As reported in Table 4, the R² value of the big model was comparable to those of the single-institution models, and its χ² value was the closest to 1.0. The ratios of outliers in the big model were also comparable to those of each single-institution model in Table 5. These results indicate that the big model, in terms of regression quality, could be used in the same way as each single-institution model without the impact of outliers previously seen in other studies [11,23].
Sharing one RP model among multiple institutions can reduce inter-institution variation, as indicated by the reduction of SD values in Table 3, leading to standardization [14]. A previous study noted that an RP model is difficult to share with other institutions because of different planning strategies [12]. Our big model, as described in the current study, can cover a wide range of combinations of anatomical and dosimetric characteristics based on the large number of plans, which can possibly overcome this issue. Therefore, sharing a big model generated from even more plans collected worldwide could help realize the standardization of plan quality at any institution.
For example, at a new institution, the planners would use the optimization parameters predefined by the big model, and could then customize them or use their own parameters if the resulting plans do not meet the dose criteria and/or planning strategy of that institution. Knowledge-based planning (KBP) can also serve as a training tool for planners and institutions implementing manual optimization [14]. One limitation is that this study included only two cases for evaluation. However, these were familiar prostate cancer cases; one previous study compared the dosimetric performance of KBP models among five institutes [12], and another evaluated whether KBP models could improve dosimetric performance over the treatment period [14]. It is necessary to investigate more cases and various treatment sites. A big model like the one presented here might also be applied to stereotactic radiotherapy because of its simple anatomical characteristics, while further study is needed for anatomically complicated cases such as head and neck cancer. The mechanical performance and delivery accuracy of the plans generated with the big model should also be verified before clinical use [24].

Conclusions
The big model, trained with over 500 clinical plans from multiple institutions with various planning strategies for targets and OARs in prostate cancer, could generate a superior or comparable plan quality compared with the VMAT plans generated with the single-institution models. Our work suggests a potential for plan quality standardization and reduction of inter-institution variability by using the big model.

Data availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.